Exploration server for computer science research in Lorraine

Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct

Internal identifier: 001694 (Main/Exploration); previous: 001693; next: 001695

Authors: Olivier Sigaud [France]; Freek Stulp [France]

Source: Revue d'intelligence artificielle [ISSN 0992-499X]; 2013

RBID : Pascal:13-0216767

Abstract

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI² is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI² as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI² to other members of the same family - the 'Cross-Entropy Method' and 'Covariance Matrix Adaptation - Evolutionary Strategy' - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI²-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI²-CMA's main advantage is that it determines the magnitude of the exploration noise automatically. We illustrate this advantage on a non-trivial simulated robotics experiment.
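
The probability-weighted averaging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a diagonal covariance, a toy quadratic cost, and an exponential cost-to-weight mapping with a hypothetical sharpness parameter `h`. The names `sphere` and `weighted_update` are invented for this illustration.

```python
import math
import random


def sphere(theta):
    """Toy cost function (an assumption of this sketch): squared distance to the origin."""
    return sum(x * x for x in theta)


def weighted_update(mean, sigma, cost, n_samples=20, h=10.0, rng=None):
    """One probability-weighted averaging step with a diagonal covariance.

    Samples parameters around `mean`, turns costs into probabilities
    (lower cost gives higher weight), then re-estimates both the mean and
    the per-dimension standard deviation from the weighted samples.
    """
    rng = rng or random.Random(0)
    dim = len(mean)
    samples = [[rng.gauss(mean[d], sigma[d]) for d in range(dim)]
               for _ in range(n_samples)]
    costs = [cost(s) for s in samples]

    # Normalize costs to [0, 1] before exponentiating; `h` is hypothetical.
    c_min = min(costs)
    span = (max(costs) - c_min) or 1.0
    weights = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Weighted average of the samples gives the new mean.
    new_mean = [sum(p * s[d] for p, s in zip(probs, samples))
                for d in range(dim)]
    # Weighted spread around the previous mean adapts the exploration noise.
    new_sigma = [math.sqrt(sum(p * (s[d] - mean[d]) ** 2
                               for p, s in zip(probs, samples)))
                 for d in range(dim)]
    return new_mean, new_sigma


rng = random.Random(0)
mean, sigma = [2.0, -3.0], [1.0, 1.0]
for _ in range(50):
    mean, sigma = weighted_update(mean, sigma, sphere, rng=rng)
```

Because the spread is re-estimated from the weighted samples at every step, the exploration magnitude in this sketch is set by the data rather than by a hand-tuned schedule, which is the property the abstract highlights.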


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="fr" level="a">Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct</title>
<author>
<name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut des Systèmes Intelligents et de Robotique Université Pierre et Marie Curie CNRS UMR 7222 4, place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Cognitive Robotics École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech) 32, Boulevard Victor</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<inist:fA14 i1="03">
<s1>FLOWERS Research Team INRIA Bordeaux Sud-Ouest 351, Cours de la Libération</s1>
<s2>33405 Talence</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Nouvelle-Aquitaine</region>
<region type="old region" nuts="2">Aquitaine</region>
<settlement type="city">Talence</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0216767</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0216767 INIST</idno>
<idno type="RBID">Pascal:13-0216767</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000061</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000946</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000040</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000040</idno>
<idno type="wicri:doubleKey">0992-499X:2013:Sigaud O:adaptation:de:la</idno>
<idno type="wicri:Area/Main/Merge">001711</idno>
<idno type="wicri:Area/Main/Curation">001694</idno>
<idno type="wicri:Area/Main/Exploration">001694</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="fr" level="a">Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct</title>
<author>
<name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Institut des Systèmes Intelligents et de Robotique Université Pierre et Marie Curie CNRS UMR 7222 4, place Jussieu</s1>
<s2>75252 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>Cognitive Robotics École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech) 32, Boulevard Victor</s1>
<s2>75015 Paris</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<inist:fA14 i1="03">
<s1>FLOWERS Research Team INRIA Bordeaux Sud-Ouest 351, Cours de la Libération</s1>
<s2>33405 Talence</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Nouvelle-Aquitaine</region>
<region type="old region" nuts="2">Aquitaine</region>
<settlement type="city">Talence</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
<imprint>
<date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Revue d'intelligence artificielle</title>
<title level="j" type="abbreviated">Rev. intell. artif.</title>
<idno type="ISSN">0992-499X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Adaptation</term>
<term>Addressing</term>
<term>Artificial intelligence</term>
<term>Averaging method</term>
<term>Black box</term>
<term>Cost function</term>
<term>Covariance matrix</term>
<term>Entropy</term>
<term>Evolutionary algorithm</term>
<term>Matrix method</term>
<term>Modeling</term>
<term>Optimal control</term>
<term>Optimal control (mathematics)</term>
<term>Optimization</term>
<term>Path integral</term>
<term>Policy</term>
<term>Probabilistic approach</term>
<term>Reinforcement learning</term>
<term>Robotics</term>
<term>Statistical analysis</term>
<term>Statistical estimation</term>
<term>Stochastic control</term>
<term>Updating</term>
<term>Variance</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Adaptation</term>
<term>Intelligence artificielle</term>
<term>Boîte noire</term>
<term>Apprentissage renforcé</term>
<term>Estimation statistique</term>
<term>Mise à jour</term>
<term>Robotique</term>
<term>Adressage</term>
<term>Politique</term>
<term>Commande stochastique</term>
<term>Commande optimale</term>
<term>Contrôle optimal</term>
<term>Matrice covariance</term>
<term>Modélisation</term>
<term>Optimisation</term>
<term>Approche probabiliste</term>
<term>Méthode moyenne</term>
<term>Analyse statistique</term>
<term>Fonction coût</term>
<term>Entropie</term>
<term>Méthode matricielle</term>
<term>Algorithme évolutionniste</term>
<term>Intégrale parcours</term>
<term>Variance</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Intelligence artificielle</term>
<term>Robotique</term>
<term>Politique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI
<sup>2</sup>
is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI
<sup>2</sup>
as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI
<sup>2</sup>
to other members of the same family - the 'Cross-Entropy Method' and 'Covariance Matrix Adaptation - Evolutionary Strategy' - at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI
<sup>2</sup>
-CMA for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI
<sup>2</sup>
-CMA's main advantage is that it determines the magnitude of the exploration noise automatically. We illustrate this advantage on a non-trivial simulated robotics experiment.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Aquitaine</li>
<li>Nouvelle-Aquitaine</li>
<li>Île-de-France</li>
</region>
<settlement>
<li>Paris</li>
<li>Talence</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Île-de-France">
<name sortKey="Sigaud, Olivier" sort="Sigaud, Olivier" uniqKey="Sigaud O" first="Olivier" last="Sigaud">Olivier Sigaud</name>
</region>
<name sortKey="Stulp, Freek" sort="Stulp, Freek" uniqKey="Stulp F" first="Freek" last="Stulp">Freek Stulp</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001694 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001694 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0216767
   |texte=   Adaptation de la matrice de covariance pour l'apprentissage par renforcement direct
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022